2016 Presidential Election Contributions - California

Huey Kwik

Univariate Plots

## [1] 1072770      25
## Classes 'tbl_df', 'tbl' and 'data.frame':    1072770 obs. of  25 variables:
##  $ cmte_id          : chr  "C00575795" "C00575795" "C00575795" "C00577130" ...
##  $ cand_id          : chr  "P00003392" "P00003392" "P00003392" "P60007168" ...
##  $ cand_nm          : Factor w/ 25 levels "Bush, Jeb","Carson, Benjamin S.",..: 4 4 4 20 20 20 20 4 20 20 ...
##  $ contbr_nm        : chr  "AULL, ANNE" "CARROLL, MARYJEAN" "GANDARA, DESIREE" "LEE, ALAN" ...
##  $ contbr_city      : chr  "LARKSPUR" "CAMBRIA" "FONTANA" "CAMARILLO" ...
##  $ contbr_st        : chr  "CA" "CA" "CA" "CA" ...
##  $ contbr_zip       : int  949391913 934284638 923371507 930111214 902784310 902784310 920842849 926372912 926833846 949522729 ...
##  $ contbr_employer  : chr  "N/A" "N/A" "N/A" "AT&T GOVERNMENT SOLUTIONS" ...
##  $ contbr_occupation: chr  "RETIRED" "RETIRED" "RETIRED" "SOFTWARE ENGINEER" ...
##  $ contb_receipt_amt: num  50 200 5 40 35 100 25 40 10 15 ...
##  $ contb_receipt_dt : Date, format: "2016-04-26" "2016-04-20" ...
##  $ receipt_desc     : chr  NA NA NA NA ...
##  $ memo_cd          : chr  "X" "X" "X" NA ...
##  $ memo_text        : chr  "* HILLARY VICTORY FUND" "* HILLARY VICTORY FUND" "* HILLARY VICTORY FUND" "* EARMARKED CONTRIBUTION: SEE BELOW" ...
##  $ form_tp          : chr  "SA18" "SA18" "SA18" "SA17A" ...
##  $ file_num         : int  1091718 1091718 1091718 1077404 1077404 1077404 1077404 1091718 1077404 1077404 ...
##  $ tran_id          : chr  "C4768722" "C4747242" "C4666603" "VPF7BKWA097" ...
##  $ election_tp      : Factor w/ 3 levels "G2016","P2016",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ cand_last_name   : Factor w/ 25 levels "Bush","Carson",..: 4 4 4 20 20 20 20 4 20 20 ...
##  $ party            : Ord.factor w/ 5 levels "Democratic"<"Republican"<..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ zip              : chr  "94939" "93428" "92337" "93011" ...
##  $ city             : chr  "Larkspur" "Cambria" "Fontana" "Camarillo" ...
##  $ state            : chr  "CA" "CA" "CA" "CA" ...
##  $ latitude         : num  37.9 35.6 34 34 33.9 ...
##  $ longitude        : num  -123 -121 -117 -119 -118 ...

Distribution of Contribution Amounts

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
## -10000.0     15.0     27.0    122.7    100.0  10800.0        1
## [1] "Number of negative contributions: 10617"

Observations:

  • I used log(count) so we could more easily see the data on a chart.
  • Why are there negative contributions?
  • Contributions range from -$10,000 to $10,800.
  • There is one missing value for contb_receipt_amt in the dataset.
  • Most contributions are small: the median is $27 and the third quartile is $100.
  • Fewer than 1% of the contributions are negative, so it’s probably okay to continue the analysis and just note this. However, I am still curious about why.
  • Isn’t there a $2,700 contribution limit for individuals? We could check whether contributions greater than $2,700 are offset by matching negative entries.
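The share of negative contributions can be checked from the two counts printed above. The following is a minimal base R sketch; `amt` is a toy vector standing in for contb_receipt_amt (its values are illustrative, not taken from the dataset), and the log transform of the bin counts is only one plausible way to get the log(count) view described in the observations.

```r
# Counts reported in the output above.
n_total    <- 1072770
n_negative <- 10617
share_negative <- n_negative / n_total
round(100 * share_negative, 2)     # percent of records that are negative

# Toy stand-in for contb_receipt_amt (illustrative values only).
amt <- c(50, 200, 5, 40, -2700, NA, 10800)
sum(amt < 0, na.rm = TRUE)         # number of negative entries
sum(is.na(amt))                    # number of missing entries

# Bin the amounts, then put the bin counts on a log10 scale so that one
# dominant bin does not hide the rest of the distribution.
h <- hist(amt, breaks = 20, plot = FALSE)
log_counts <- log10(h$counts + 1)  # +1 avoids log10(0) for empty bins
```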

Double Counting Contributions?

## # A tibble: 8 × 5
##            contbr_nm contb_receipt_dt contb_receipt_amt        tran_id
##                <chr>           <date>             <dbl>          <chr>
## 1    HOROWITZ, DAVID       2015-07-06             10800   SA17A.123258
## 2    HOROWITZ, DAVID       2015-07-06             -5400 SA17A.123258.0
## 3 HOROWITZ, MICHELLE       2015-07-06              5400 SA17A.123258.1
## 4 HOROWITZ, MICHELLE       2015-07-06             -2700 SA17A.123258.2
## 5 HOROWITZ, MICHELLE       2015-07-06              2700 SA17A.123258.3
## 6    HOROWITZ, DAVID       2015-07-06             -2700 SA17A.123258.4
## 7    HOROWITZ, DAVID       2015-07-06              2700 SA17A.123258.5
## 8 HOROWITZ, MICHELLE       2015-07-06              5400   SA17A.123263
## # ... with 1 more variables: election_tp <fctr>

I looked at some examples of contributions that were above $2700 and came across David and Michelle Horowitz. They appear to be a couple who donated to Scott Walker’s campaign.

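Summing contb_receipt_amt over the first seven rows, we get $10,800 (the eighth row, a second $5,400 entry for Michelle Horowitz, would bring the total to $16,200). Is this an instance of people contributing over the limit?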

From what I can tell, this instead seems to be double-counting! The FEC provides an Individual Contributor Search, which lets us look at each contributor record in more detail.

From there, I was able to piece together this story:

  • On 7/6/2015, David Horowitz donates a total of $10,800 to Scott Walker’s primary campaign.
  • Of that $10,800, he reattributes $5,400 to Michelle Horowitz.
  • He redesignates $2,700 to Walker’s General Election campaign. This money gets refunded on 11/16/2015.
  • The $5,400 appears twice, in the 3rd and 8th rows of the table.
  • Assuming that those records are duplicates, then Michelle Horowitz reattributes $2,700 to Walker’s General Election campaign.

If this story is true, then these donations are within the campaign contribution limits for primary and general elections. From an election integrity standpoint, this is good.

However, when analyzing this data, we should be aware of this discrepancy. A contribution like Michelle Horowitz’s reattributed $5,400 may be double-counted. Also, a large contribution of $10,800 by David Horowitz will count towards the mean, even though it gets reattributed into smaller contributions later.
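One way to see the effect of reattributions is to net out each transaction "family". The sketch below uses the eight rows from the table above. It assumes, based only on the pattern visible in those rows, that a child record appends ".<n>" to its parent's tran_id; `root_id` is a hypothetical helper column, and the convention may not hold for other committees' filings.

```r
# Rows copied from the table above (David and Michelle Horowitz).
tx <- data.frame(
  contbr_nm = c("HOROWITZ, DAVID", "HOROWITZ, DAVID", "HOROWITZ, MICHELLE",
                "HOROWITZ, MICHELLE", "HOROWITZ, MICHELLE", "HOROWITZ, DAVID",
                "HOROWITZ, DAVID", "HOROWITZ, MICHELLE"),
  contb_receipt_amt = c(10800, -5400, 5400, -2700, 2700, -2700, 2700, 5400),
  tran_id = c("SA17A.123258", "SA17A.123258.0", "SA17A.123258.1",
              "SA17A.123258.2", "SA17A.123258.3", "SA17A.123258.4",
              "SA17A.123258.5", "SA17A.123263"),
  stringsAsFactors = FALSE
)

# ASSUMPTION: a child record appends ".<n>" to its parent's tran_id, so
# dropping everything after the second dot recovers the parent id.
tx$root_id <- sub("^([^.]+\\.[^.]+)\\..*$", "\\1", tx$tran_id)

# Net amount per transaction family: reattributions cancel out.
net <- aggregate(contb_receipt_amt ~ root_id, data = tx, FUN = sum)
net
```

Under that assumption the first family nets to the original $10,800, and the separate $5,400 record stands out as the candidate duplicate.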

Party

Democrats had the most contributions by far, which makes sense in California.

Contribution Locations

As you can see, some contributions map to locations outside of California. In the records below, contbr_st is CA, but the zipcode resolves to another state.

## # A tibble: 139 × 5
##    state   zip       city contbr_city contbr_st
##    <chr> <chr>      <chr>       <chr>     <chr>
## 1     NV 89411      Genoa       GENOA        CA
## 2     HI 96743    Kamuela  SANTA YNEZ        CA
## 3     WY 82717   Gillette    GILLETTE        CA
## 4     UT 84096   Herriman    HERRIMAN        CA
## 5     AP 96349        Fpo     FPO  AP        CA
## 6     AP 96260        Apo         APO        CA
## 7     HI 96737 Ocean View    BLUE JAY        CA
## 8     OR 97209   Portland    PORTLAND        CA
## 9     NV 89052  Henderson   HENDERSON        CA
## 10    HI 96743    Kamuela  SANTA YNEZ        CA
## # ... with 129 more rows

Let’s restrict our visualization to known California zipcodes:

It seems like most of the contributions are centered around the major cities in California: Los Angeles, San Francisco, San Diego, and Sacramento.

Occupations, Employers

Above, we look at the top 10 occupations and employers in our dataset.

As we can see, retirees make up a large chunk of our dataset, as do the self-employed.

Univariate Analysis

What is the structure of your dataset?

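There are 1,073,271 records in the raw dataset with 18 features. The data frame summarized under Univariate Plots has 25 columns because it also includes seven variables I added (cand_last_name, party, zip, city, state, latitude, longitude), and it has slightly fewer rows (1,072,770).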

The features are as follows:

  • committee id
  • candidate id, name
  • contributor name, city, state, zipcode
  • contributor’s employer, occupation
  • contribution amount
  • contribution date
  • receipt description
  • memo code
  • memo text
  • form type
  • file number
  • transaction id
  • election type

Factors: Candidate name, election type (Primary 2016, General 2016, or Primary 2020)

Other observations:

  • Date range: April 1, 2015 to October 31, 2016
  • There are 60587 unique employers.
  • There are 136234 unique zipcodes.
  • There are 2418 unique cities.
  • There are 209688 unique contributors.
  • There are 26654 unique occupations.
  • There are seven donations designated for Primary 2020. They are all redesignations from general election contributions to Lindsey Graham.

What is/are the main feature(s) of interest in your dataset?

I’m mostly interested in looking at patterns/differences in contributions among different candidates. So for me, the main features of interest are candidate name, contribution amount, date, and location.

What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

Zipcode, employers, occupation, and party might provide other angles into the data.

Did you create any new variables from existing variables in the dataset?

I created a variable to represent each candidate’s political party.

In order to get geospatial information, I merged in data from the zipcode dataset, using contbr_zip as the key. This merged in latitude and longitude information.
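A minimal sketch of such a merge in base R follows. The three contributor rows and the rounded coordinates are taken from the str() output at the top of the report; `zip_lookup` is a hypothetical stand-in for the zipcode dataset, and using the first five digits of contbr_zip as the key is an assumption inferred from the zip column shown there.

```r
# Toy rows; contbr_zip values and rounded coordinates come from the str()
# output above. `zip_lookup` stands in for the zipcode dataset.
contribs <- data.frame(
  contbr_nm  = c("AULL, ANNE", "CARROLL, MARYJEAN", "GANDARA, DESIREE"),
  contbr_zip = c(949391913L, 934284638L, 923371507L),
  stringsAsFactors = FALSE
)
zip_lookup <- data.frame(
  zip       = c("94939", "93428", "92337"),
  state     = "CA",
  latitude  = c(37.9, 35.6, 34.0),
  longitude = c(-123, -121, -117),
  stringsAsFactors = FALSE
)

# ASSUMPTION: the join key is the first five digits of the ZIP+4 value.
contribs$zip <- substr(as.character(contribs$contbr_zip), 1, 5)

# Left join: keep every contribution, even if the zipcode is unknown.
merged <- merge(contribs, zip_lookup, by = "zip", all.x = TRUE)

# Restrict to zipcodes that resolve to California.
ca_only <- merged[!is.na(merged$state) & merged$state == "CA", ]
```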

Finally, I was curious how donations correlated with votes, so I added in the primary vote totals and delegates, which I found on Wikipedia.

Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

When histogramming the contribution amounts, I used a log scale since one of the bins was really large. This made it easier to see the rest of the data.

Bivariate Plots

Contributions per Candidate

##  contb_receipt_amt 
##  Min.   :-5700.00  
##  1st Qu.:   15.00  
##  Median :   27.00  
##  Mean   :   50.57  
##  3rd Qu.:   50.00  
##  Max.   :10000.00
##  contb_receipt_amt
##  Min.   :-5400.0  
##  1st Qu.:   15.0  
##  Median :   25.0  
##  Mean   :  146.3  
##  3rd Qu.:  100.0  
##  Max.   : 7300.0  
##  NA's   :1

Since Sanders is often portrayed as the more progressive, blue-collar candidate, it is interesting that Clinton and Sanders have roughly the same median donation; Clinton’s is actually slightly lower ($25 vs. $27). Their mean donations differ more ($146.30 for Clinton vs. $50.57 for Sanders), which suggests that Clinton received more large contributions. Of course, this data does not include donations to Political Action Committees, so that could be a factor.
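The gap between median and mean is what a few large donations produce. A small base R sketch, using toy amounts that are illustrative only and not drawn from the dataset:

```r
# Toy data: amounts are illustrative, not taken from the dataset.
d <- data.frame(
  cand_last_name    = c("Sanders", "Sanders", "Sanders",
                        "Clinton", "Clinton", "Clinton"),
  contb_receipt_amt = c(15, 27, 100, 10, 25, 2700)
)

# Per-candidate median and mean; na.rm guards against missing amounts.
med <- tapply(d$contb_receipt_amt, d$cand_last_name, median, na.rm = TRUE)
avg <- tapply(d$contb_receipt_amt, d$cand_last_name, mean, na.rm = TRUE)
```

Here a single $2,700 donation leaves the median almost unchanged but pulls the mean far above it.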

For the box plots, I sorted the candidates from highest number of donations to lowest.

In the first box plot, we can see that some candidates actually have many donations above the individual limit of $2700. Many also have negative donations, which could either be refunds or reattributions.

In the second box plot, I excluded negative contributions to see if we could see any other patterns.

Life of a Campaign

Observations:

  • Campaign donations start in early 2015 but really pick up in 2016, which coincides with primary season.
  • The Republican field was crowded. Trump had a huge spike in donations in the middle of 2016, perhaps when it was clear he was going to win the nomination.
  • Sanders vs. Clinton is interesting to look at. It looks like Sanders actually gets more donations during much of the primary season. The decline in his donations also corresponds roughly to when it becomes apparent that he will lose the nomination.
  • Evan McMullin’s donations picked up in the final month of the campaign.
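A time series like this can be built by counting donations per candidate per month. The sketch below is one way to do that in base R; the dates are toy values chosen for illustration, and `month` is a hypothetical helper column.

```r
# Toy data: dates are illustrative, not taken from the dataset.
d <- data.frame(
  cand_last_name   = c("Sanders", "Sanders", "Clinton", "Sanders", "Clinton"),
  contb_receipt_dt = as.Date(c("2016-03-05", "2016-03-20", "2016-03-11",
                               "2016-04-02", "2016-07-29"))
)

# Bucket each donation by calendar month, then cross-tabulate by candidate.
d$month <- format(d$contb_receipt_dt, "%Y-%m")
monthly <- table(d$month, d$cand_last_name)
monthly
```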

Donation vs. Last Donation Date

Last donation date could be a proxy variable for how long a campaign lasts. As we can see, this is positively correlated with the total number of donations.

Because campaigns can still receive donations after the campaign has been “suspended”, last donation date by itself isn’t a good indicator of when a campaign ends. For that, it’s better to look at a histogram of dates.

Total Amount vs. Number of Donations

## 
##  Pearson's product-moment correlation
## 
## data:  total and n
## t = 12.959, df = 23, p-value = 4.697e-12
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8622126 0.9725650
## sample estimates:
##       cor 
## 0.9378353

Here we look at a log-log scatter plot of total amount raised vs. number of donations. There is a clear positive correlation (Pearson correlation of 0.94). This makes sense intuitively, especially when you consider that individual campaign contributions are capped at $2,700 per election: raising a lot of money requires many donations.
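As a sanity check, the t statistic and confidence interval in the output above can be reproduced from the correlation and the degrees of freedom alone. This sketch uses the standard formulas for a Pearson correlation test (t = r * sqrt(df) / sqrt(1 - r^2), and a Fisher z interval); the variable names are my own.

```r
# Values reported in the cor.test() output above.
r  <- 0.9378353
df <- 23            # n - 2, so n = 25 candidates
n  <- df + 2

# t statistic for testing whether the true correlation is zero.
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# 95% confidence interval via the Fisher z-transformation.
z    <- atanh(r)
half <- qnorm(0.975) / sqrt(n - 3)
ci   <- tanh(c(z - half, z + half))

round(t_stat, 3)
round(ci, 4)
```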

Delegate Count vs. Number of Donations

Only a few candidates will actually receive any delegates, so it’s hard to make claims about the relationship between number of donations and delegate count. Perhaps it’s exponential.

Votes vs. Number of Donations

This is on a log-log scale. There seems to be a positive correlation here.

Contributions by Party and Location

This visualization again shows concentration of donation activity around cities, but also shows more domination by Democrats.

Party vs. Occupation

The top chart shows the top ten occupations for Democrats. The bottom chart shows the top ten occupations for Republicans.

Appear on both top ten lists:

  • Retired, the top occupation type for both.
  • Attorney
  • Teacher
  • Info Requested
  • Engineer
  • Physician

Just Democrats:

  • Software Engineer
  • Professor
  • Consultant

Just Republicans:

  • Homemaker
  • Sales
  • Self-Employed

Party vs. Employer

Nothing interesting when comparing party vs. employer.

Candidate vs. Occupation

Here I looked at the top four candidates in the primaries: Clinton, Sanders, Trump, and Cruz. The charts appear in that order from left-to-right.

Each chart shows the top 10 occupations.

One thing that stood out to me is the percentage of donations that came from retirees. For Clinton, Trump, and Cruz, retirees make up more than 60% of donations. For Sanders, this is less than 10%. Instead, 60% of his donors are listed as “Not Employed.”

Bivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

Number of contributions and total contributions are positively correlated.

Both of these features are positively correlated with number of votes and number of delegates in the primary election.

Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

What was the strongest relationship you found?

Tracking the donations through time can give us a sense of the story of the campaign.

For Democrats, the number of donations seems to roughly track what is going on in the campaign. Sanders gains a lot of interest throughout the campaign, peaks, and then declines as it becomes clearer he will not win the nomination. Clinton’s donations rise after the convention and throughout

Multivariate Plots Section

Map View Donations

Each point on the map represents a zipcode.

The first chart shows the number of donations. The second chart shows the total amount donated.

These two charts look similar, but when we compare them, it seems like Democratic money is more tightly concentrated around cities in the second chart than the first.

I tried faceting by party to see if anything stood out, but I don’t think this set of visualizations showed much more than the previous set.

Where are Clinton’s Supporters? Trump’s Supporters?

Similar to the differences between Republicans and Democrats, Clinton’s support draws heavily from urban areas. Trump’s support appears to be more evenly spread.

Zoom into San Francisco Bay Area

Zooming in on the Bay Area, it looks like Democrats get more of their donations from urban cities than Republicans do.

Observations:

  • Some of the points appear to be in the water. Perhaps there are discrepancies between the lat/long coordinates we get from the zipcode dataset and that from map_data.
  • Trump has relatively more support on the perimeter of San Francisco than he does in the denser areas.

Multivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

The Democratic and Republican parties receive donations from similar areas, with the Democrats receiving more donations from more densely populated areas.

Were there any interesting or surprising interactions between features?

I expected to see some geographic difference between Clinton’s and Sanders’s support, but they were largely the same.


Final Plots and Summary

Plot One

Description One

Each point on the map represents a zipcode. The size of the point represents the number of donations for that zipcode. The color of the point represents the political party.

Over one million donations are visualized on this map. It’s clear that both parties draw support from more populated areas, but the Democrats especially draw support from urban cities.

Plot Two

Description Two

This chart shows the battle between Hillary Clinton and Bernie Sanders using the number of donations.

We can see that Sanders peaks around late March and early April of 2016, when he wins nine out of ten contests over Clinton.

Donations decline as June 7th approaches, when Clinton clinches the nomination.

The Democratic National Convention ran from July 25th to July 28th in 2016, which is when we see Clinton’s donation types switch from Primary to General.

Plot Three

Description Three

This chart shows the Top 10 Occupations for each Candidate. I chose Clinton, Sanders, Trump, and Cruz because they were the top two candidates for their respective primaries.

Retirees make up the bulk of donations for each candidate except for Sanders, who drew a lot of his support from those listed as “Not Employed.”


Reflection

This data set contains information on more than 1 million donations to the 2016 Presidential election campaigns in California. I started by understanding the individual variables in the data set, and then I explored interesting questions and leads as I continued to make observations on plots.

Visualizations of the data helped me spot problems in the data to fix. For instance, I needed to merge variants of the phrase “Self Employed” (like “Self-Employed” or “Self”).
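A cleanup step of this kind might look like the following base R sketch. The list of variants contains only the spellings mentioned above, and the canonical label "SELF-EMPLOYED" is an assumption for illustration; the real data may need more variants.

```r
# Toy stand-in for contbr_occupation (illustrative values only).
occupation <- c("Self Employed", "SELF-EMPLOYED", "self", "RETIRED", " Self ")

# Variants to merge, compared after trimming whitespace and upper-casing.
variants <- c("SELF EMPLOYED", "SELF-EMPLOYED", "SELF")

# ASSUMPTION: "SELF-EMPLOYED" is used as the canonical label.
cleaned <- toupper(trimws(occupation))
cleaned[cleaned %in% variants] <- "SELF-EMPLOYED"
table(cleaned)
```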

Given that California is a solidly Democratic state, it was hard to tease out big differences between Republican and Democratic donations. I chose California because of my familiarity with the state, but looking at a swing state like Ohio or Pennsylvania may yield more interesting results.

Finally, I wish there were more information about each of the donors so I could do more analysis. It’s interesting that Sanders has a large number of unemployed supporters, but the current dataset does not provide much information about them. For instance, I would like to understand the distribution of ages in this group (e.g. are they students?). The dataset as is doesn’t make it easy to dig into these sorts of questions.